Testing General Relativity using Bayesian model selection: Applications to 
observations of gravitational waves from compact binary systems 
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Second generation interferometric gravitational wave detectors, such as Advanced LIGO and Ad- 
vanced Virgo, are expected to begin operation by 2015. Such instruments plan to reach sensitivities 
that will offer the unique possibility to test General Relativity in the dynamical, strong field regime 
and investigate departures from its predictions, in particular using the signal from coalescing bi- 
nary systems. We introduce a statistical framework based on Bayesian model selection in which 
the Bayes factor between two competing hypotheses measures which theory is favored by the data. 
Probability density functions of the model parameters are then used to quantify the inference on 
individual parameters. We also develop a method to combine the information coming from multiple 
independent observations of gravitational waves, and show how much stronger inference could be. 
As an introduction and illustration of this framework - and a practical numerical implementation 
through the Monte Carlo integration technique of nested sampling - we apply it to gravitational 
waves from the inspiral phase of coalescing binary systems as predicted by General Relativity and a 
very simple alternative theory in which the graviton has a non-zero mass. This method can trivially 
(and should) be extended to more realistic and physically motivated theories. 

PACS numbers: 04.80.Nn, 02.70.Uu, 02.70.Rr 



I. INTRODUCTION 

General Relativity (GR) has so far passed every ex- 
perimental test with flying colours. Yet, a whole range 
of efforts is under way to put Einstein's theory under 
even more intense experimental scrutiny in the com- 
ing years Amongst these, highly sensitive gravi- 
tational wave experiments arc opening new means to 
probe gravity in the dynamical, strong-field regime 0. 
Ground-based gravitational-wave laser interferometers, 
LIGO d, Q and Virgo @ have now reached a sensitivity 
that could plausibly lead to the first direct detection of 
gravitational waves. The upgrade of these instruments 
to the second generation (also known as advanced con- 
figuration) is already under way; Advanced LIGO @ and 
Advanced Virgo 01 are expected to start science obser- 
vations by 2015, and to provide a wealth of detections 
from a variety of sources [8l-[l0|. As soon as a posi- 
tive detection is achieved, one can surely expect that 
gravitational-wave data will be used to test the predic- 
tions of General Relativity, see P, [Ij for recent reviews. 
In the longer term future, very-high sensitivity laser in- 
terferometers, both space-based - such as the Laser In- 
terferometer Space Antenna (LISA) [ll| and Decigo [l^ 
- and ground-based third-generation instruments, such 
as the Einstein gravitational-wave Telescope (ET) [H- 
flBj . will increase our ability to test alternative theories 
of gravity. 

Tests of General Relativity through gravitational wave 
observations have already been discussed in several stud- 
ies [l6l - [37j . Of particular interest are observations of 
the coalescence of binary systems, as they allow us to 



probe the dynamic, highly relativistic and strong-field 
regime through the precise monitoring of the amplitude 
and phase evolution of gravitational waves, that can be 
accurately predicted in General Relativity and alterna- 
tive theories of gravity. 

The conceptual approach that most of these studies 
have employed to investigate the ability to test General 
Relativity is based on the computation of the expected 
accuracy of the measurements of the unknown parame- 
ters that characterise the radiation, including those in- 
troduced by alternative theories of gravity; moreover, the 
statistical errors are estimated using the inverse of the 
Fisher information matrix [s^. These results are useful 
as approximate figures of merit for the constraints that 
can be expected to be derived from observations. How- 
ever, they suffer from various conceptual (and practical) 
limitations that we address in this paper. The first is that 
at the conceptual level, those studies do not actually ad- 
dress whether observations will be able to discriminate 
an alternative theory of gravity from General Relativity. 
They simply assume that the true theory of gravity is 
different from General Relativity, and by computing the 
expected statistical errors on the unknown signal param- 
eters (including those that encode deviations from GR) 
make some statements on whether or not the observations 
have sensitivity to the relevant parametcr(s). [This leaves 
aside the issue that the computation of the variance- 
covariance matrix simply provides a lower bound, the 
Cramer- Rao bound [3^, |40| , to the variance of the sta- 
tistical errors, which is a meaningful bound on the ac- 
curacy of parameter estimation only in the limit of high 
signal-to- noise ratio, see e.g. Refs. (4l| - |4^ . Secondly, 
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those studies ignore what would be the consequences of 
(small) deviations from GR, such that detections can still 
be achieved using GR waveforms when the actual the- 
ory of gravity differs from GR, but parameter estimation 
can be affected due to the use of the "wrong" waveform 
model; this issue has been recently discussed in Ref. [s^l 
and termed "bias in gravitational wave astronomy" . In 
addition to this, past studies do not take into account 
the fact that one can take advantage of the (hopefully) 
many detections to provide better constraints on alter- 
native theories of gravity by combining the observations. 
In fact, although each astrophysical source will be char- 
acterised by different astrophysical parameters (such as 
masses and spins, in the case of coalescing binaries), if 
there is a deviation from GR described by some "global" 
fundamental parameters of the new theory that are the 
same for every system - e.g. the gravitational radiation 
dispersion velocity is affected in such a way that one can 
associate the graviton a mass different from zero, cor- 
responding to a specific value ~ then one can combine, 
see e.g. Ref. [i^ all the observations and obtain better 
constraints. Finally, there has been no actual attempt 
to provide a method to address these issues in an anal- 
ysis that can be implemented in practice and therefore 
move towards using actual gravitational wave data to 
place constraints on theories of gravity. 

In this paper we tackle these specific issues: we in- 
troduce a conceptual and practical approach within the 
framework of Baycsian inference to discriminate between 
different theories of gravity by performing model selec- 
tion, and to estimate the unknown model parameters. 
The approach based on Bayesian inference is particu- 
larly simple and powerful as it provides both a statement 
on the relative probabilities of models (a given alterna- 
tive theory of gravity versus General Relativity) and on 
the distribution of the unknown parameters that charac- 
terise the theory, the (marginalised) posterior probabil- 
ity density functions (PDFs). The conceptual simplic- 
ity of Bayes' theorem is balanced by the computation- 
ally expensive A'^-dimensional integral (where N, usually 
^ 10, is the total number of the unknown parameters) 
on which this theorem relies for the calculation of the ev- 
idence and the marginalised PDF's. Several integration 
techniques, such (Reversible Jump) Markov-chain Monte 



Carlo methods [461, |47 
and Nested Sampling 



thermodynamic integration [48[ 
49| have been explored in a range 
of fields to tackle this computational challenge. For our 
analysis we use a nested sampling algorithm that some 
of us have developed for applications in the context of 
observations of coalescing binaries with ground-based in- 
struments (soj . and that has been shown to allow such 
A^-dimensional integrals to be computed in an efficient 
and relatively computationally inexpensive way (Hol . [52| - 

The method that we present here is completely general 
and can be applied to observations of any gravitational 
wave source and any alternative theory of gravity. How- 
ever, for the purpose of introducing and illustrating the 



method; in this paper we focus specifically on the case 
of a "massive graviton" theory -i.e. a theory of gravity 
in which the boson mediating the gravitational interac- 
tion is characterised by a rest-mass Trig different from 
zero, and the corresponding Compton wavelength of the 
graviton Ag is finite (the GR case corresponds to m.g = 
and Ag — > oo) - and consider how one would go about 
testing GR against this alternative theory in a statisti- 
cally rigorous way, by providing both a conceptual and 
practical (in the sense that is readily applicable to grav- 
itational wave observations) approach to this problem. 
The reason for choosing a massive graviton theory for 
our proof-of-concept analysis is two-fold: (i) the gravita- 
tional radiation emitted by coalescing compact binaries 
in a massive graviton theory takes a particularly simple 
form characterised by only one additional unknown pa- 
rameter - the Compton wavelength of the graviton Ag 
- and therefore provides an ideal proof-of-concept case 
to study, and (ii) several studies have already explored 
the feasibility of placing new limits on the graviton mass 
using gravitational wave observations, and it is therefore 
useful to compare those expectations with actual results 
from a rigorous statistical analysis performed on mock 
data sets. Clearly the analysis that we show here can be 
applied to any other theory of gravity. 

The paper is organised as follows. In Section|lT]we dis- 
cuss the statistical method that we employ. In Section 
Hm we present the models we use as a test case for our 
study, and introduce the gravitational waveform gener- 
ated by in-spiralling compact binaries in General Rela- 
tivity, and in a "massive graviton" theory. Sections IIVI 
and IVl contain the main results of the paper: in Section 
IIVI we show the results of our analysis for the observa- 
tion of a single gravitational wave signal; in Section |V] 
we derive a method for combining multiple observations 
which are expected from advanced interferometers, and 
we show how effective it can be for the specific example 
at hand to further constrain Ag. Finally in Section IVll we 
summarise our work. 



II. METHOD 

Let us consider a set of theories of gravity {Hj}, in- 
cluding General Relativity that we want to test using 
observations of gravitational waves emitted during the 
coalescence of binary systems of compact objects, either 
black holes or neutron stars. Each theory makes a predic- 
tion on the gravitational waveform h{t]6), that depends 
on the specific theory Hj and a set of unknown param- 
eters 9. The statements that we can make on any given 
theory is based on a data set d (observations) and all the 
relevant prior information / that we hold. 

Within the framework of Baycsian inference, the key 
quantity that one needs to compute is the posterior prob- 
ability of a given theory (a "model" or hypothesis) Jij. 
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Applying Bayes' theorem we obtain 

p{n,\i)p{d\n„i) 



P{H,\d,I)^ 



p{d\i) 



(1) 



where P{'Hj \d, I) is the posterior probability of the model 
Jij given the data, P{'Hj\I) is the prior probability of 
hypothesis Hj, and P{d\'Hj, I) is the marginal likelihood 
or evidence for Hj that can be written as: 

p{d\n,j)^c{n,) 

= f d9p{e\n,,i)p{d\e,nj,i). (2) 



In the previous expression p^OlHj,!) is the prior proba- 
bility density of the unknown parameter vector 9 within 
the theory Hj and p{d\9,Hj,I) is the likelihood func- 
tion of the observation d, assuming a given value of the 
parameters 9 and the theory Tij . 

If we want to compare different models - for this pa- 
per, we concentrate on General Relativity versus an al- 
ternative theory of gravity - in light of the observations 
made, we can compute the relative posterior probabili- 
ties, which is known as the odds ratio 
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where P{T-Li ) / P{T~Li) is the prior odds of the two hypothe- 
ses, the confidence we assign to the models before any 
observation, and Bi^ is the Bayes factor. Here, we are 
interested in what the data can tell us about the relative 
probabilities of two models, and so we will not involve 
the prior odds any further. 

In addition to computing the relative probabilities of 
different theories, one usually wants to make inference 
on the unknown parameters, and therefore one needs to 
compute the joint posterior probability density function 
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(4) 



From the previous expression it is simple to compute the 
marginalised PDF on any given parameter, say 9i within 
a given theory of gravity Hj 



p{9i\d,n 



d9Np{9\d,HjJ). (5) 



The key quantities for Bayesian inference in Eq. ([3]), ^ 
and ([5]) can be efficiently computed using e.g. a nested 
sampling algorithm [HI . In this paper use a specific im- 
plementation of this technique that we have developed for 
ground-based observations of coalescing binaries, whose 
technical details are described in Ref. ISOll. 



III. MODELS 

In this Section we introduce and review the gravita- 
tional waveform approximations that we use, and spell 
out the models that we consider in the proof-of-concept 
analyses, whose results are presented in Sections ITVl 
and El 



A. Gravitational waveforms 

In General Relativity, gravitational waves generated 
during the in-spiral of compact binary systems are accu- 
rately modelled using the post-Newtonian approach, see 
e.g. Ref. jsoj for a review. Here we use the standard 
restricted post-Newtonian approximation to the radia- 
tion, computing directly the waveform in the frequency 
domain by taking advantage of the stationary phase ap- 
proximation. The amplitude of the radiation contains 
therefore the leading order Newtonian contribution and 
higher order post-Newtonian terms are retained only in 
the phase. We further consider the expansion to post^- 
Newtonian order and we assume that the compact ob- 
jects have no spins. In summary, we will consider the 
frequency domain GW signal in the General Relativity 
case to be 
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is the amplitude of the signal from a source at luminos- 
ity distance Dl, and the phase ^'gr(/) at the post^- 
Newtonian order is given by [57l |: 



*gr(/) =2^/t,-$. 
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In the previous expressions / is the GW frequency, 

M = mi + 771,2 
77ll77l2 



V 
M 



M2 



(9) 
(10) 

(11) 



arc the total mass, the symmetric mass ratio and the 
"chirp" mass, respectively, of a binary of components 
masses mi and m2. Lastly, $c is the constant phase 
at some (arbitrary) reference time tc- 

In a theory of gravity in which the propagation veloc- 
ity of gravitational radiation is different from the speed 
of light, Wg 7^ c and depends on the frequency / as 
(wg/c)2 = 1 — (c//Ag)2, there is a very simple imprint on 
the phase of the radiation, as shown by Will in Ref. [l^ ■ 
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This is equivalent to the effect that a graviton (the par- 
ticle mediating the gravitational interactions) with non- 
zero mass (or a finite size graviton Compton wavelength) 
would induce. Following the accepted convention in the 
literature, we will can this a "massive-graviton (MG) the- 
ory" . In this paper, we will consider the result derived in 
Rcf. [Tsj for the expression of the waveform emitted by a 
binary system in a MG theory: 



(12) 



The amplitude is the same as for the General Relativity 
case, Eq. and the gravitational- wave phase becomes 



«'mg(/) = *gr(/) 



TT^DM 

xm + z) 



(^A//)-i , (13) 



where z is the cosmological redshift of the source. It 
is useful to notice that the frequency dependency of 
the massive-graviton term is proportional to (ttM/)"^, 
which is the same as the first post-Newtonian contribu- 
tion, see Eq. ([5]). The distance D that appears in the 
massive-graviton phase term (|13p is given by: 



D = 



l + z 



dz' 



Ho Jo (l + z')\/^M(l + 2'3)+f^A 



(14) 



where we have assumed a flat universe, and Hq, D,m and 
r^A are the Hubble parameter, the matter and cosmo- 
logical constant density parameters, today; in this paper 
we will use values consistent with the standard ACDM 
cosmology, Hq — 70 km s^^ Mpc^^, Hm — 0.3 and 
r^A — 0.7 [HHl. We note that D in general differs from 
the luminosity distance Dl that enters into the GW am- 
plitude, Eq. ([7]), by a factor (1 = z)~^. However, in 
this paper we concentrate on sources within 150 Mpc, 
z < 0.04 observed with ground-based laser interferome- 
ters with sensitivity typical of second generation. With 
these assumptions, D differs from by less than 5% and 
we set the two numbers to be the same, and ignore the 
difference, and therefore present the results directly as a 
function of Ag. In real observations, z cannot be mea- 
sured from gravitational wave observations alone, and 
therefore one cannot measure independently D and Ag: 
one can only refer to the combination oc D/[Xl{l + z)]. 
None of these assumptions have an impact on the concep- 
tual approach that we propose in this paper. However, 
some of the results on the specific cases considered here 

- an alternative theory characterised a massive graviton 

- are affected by the approximation. During the actual 
analysis of the data one would use the most accurate ex- 
pression of the waveform available. 



B. Hypotheses 

In this paper we use consider tests of General Rela- 
tivity versus a massive-graviton theory, according to the 



assumptions that we have discussed in the previous Sec- 
tion. Here we explicitly state the models (or, equiva- 
Icntly, hypotheses) that we consider and are codified with 
the following notation: 

• ^GR- The data consists of (zero mean) Gaussian 
and stationary noise of known spectral density plus 
an inspiral signal of the form described by Eqs. ©- 
(El) with A-i = 0. 

• Hmg- The data consists of (zero mean) Gaus- 
sian and stationary noise of known spectral den- 
sity plus an inspiral signal of the form described 
by Eqs. (|12p - (fT3)) . but with Ag as an additional un- 
known free parameter. 

In our analysis, for both models "Hgr and "Hmg we ac- 
tually compute the Bayes factors between the hypothesis 
that the data contain noise and signal, according to a 
given theory, versus the hypothesis that the data contain 
only only noise. We call these Bayes factors i?GR. noise 
-Smg, noise, for a signal vs noise within General Relativity 
and a massive-graviton theory, respectively (see Section 
II, and in particular II. D of Ref. [501 for more details). 
The Bayes factor between the MG and the GR models 
_Bmg,GR is then calculated simply as 



B 



MG.GR 



B 



MG, noise 



GR, noise 



(15) 



see Eqs. dS]) and (l2|). 



IV. RESULTS 

In this Section we present the results pertinent to the 
analysis of simulated gravitational wave observations us- 
ing individual detections. In Section |V] we shall then 
generalise our approach to the combination of multiple 
gravitational-wave events to produce more constraining 
statements on alternative theories of gravity, and in the 
specific case considered in this paper a massive graviton. 
In this exemplificative study, we shall use as reference 
sensitivity the one of second generation ground-based in- 
struments, such as Advanced LIGO [59| . 

We begin in Section IIVBI by discussing how to distin- 
guish between the MG and GR models using Bayesian 
model selection. We consider data sets in which coalesc- 
ing binary waveforms are present and are generated using 
MG waveforms - we inject these waveforms into Gaus- 
sian and stationary noise for a fixed massive graviton 
wavelength and binary system parameters - and analyse 
the data using both MG and GR waveforms to calculate 
the value of the Bayes factor Smg.gR; Eq. ([T5|) . In Sec- 
tion |IV2] we then discuss the effect of analysing the data 
with GR waveforms - focussing specifically on the issue 
of parameter estimation - in the case in which the actual 
correct theory of gravity is represented by a massive- 
graviton theory; we therefore address the issue of bias 
(in particular) as well as statistical errors in parameter 
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estimation which, except the pseudo-analytical approach 
in Ref-dH, cannot be addressed (and has not been ad- 
dressed) solely with Fisher Information Matrix calcula- 
tions. Finally, as the value of the graviton wavelength Ag 
is already tightly bound by Solar System tests and mea- 
surements of galaxy motions in clusters of galaxies [l| , we 
consider the case in which the correct theory of gravity is 
General Relativity (Ag — > oo) and investigate how to set 
bounds on Ag and how tight these constraints can be ex- 
pected to be. In order to do so, we inject GR signals and 
analyse the data sets using massive-graviton waveforms 
to compute bounds on Ag from the marginalised PDF of 
Ag that the analysis yields. 

Before discussing the results in Sections IIV BIIIV Pj and 
Section |V] we provide details about the simulations on 
which our analyses are based. 



A. Details of the simulations 

Our results are derived from a set of simulations on 
synthetic data sets analysed using a nested sampling al- 
gorithm that we have developed for ground-based obser- 
vations of coalescing binaries (soj . Here we provide de- 
tails on the parameters of the simulations that are used 
throughout the paper. 

We generate Gaussian and stationary noise in the fre- 
quency domain with noise spectral density representa- 
tive of second-generation instruments [s^ and superim- 
pose ("inject", in the jargon of the LIGO Scientific Col- 
laboration) frequency domain gravitational-wave inspi- 
ral signals corresponding to either the General Relativ- 
ity model, Eqs. ([SI)- ([8]), or the massive graviton model, 
Eqs. (HH) and ([13]). Observations to gain information on 
the graviton dispersion relation require the use of at least 
three instruments at geographically separated locations 
to fully break the distance-source position determination, 
see Eq. p^ . However, in order to reduce the computa- 
tional costs of the simulations, that are in general signif- 
icant, we actually use data sets from a single instrument 
and assume the source sky location is known within a 
small area in the sky. As it is possible to recover the sky 
location when using a network of three or more interfer- 
ometers, this constraint simulates the effect of using a 
network but with only one dataset to be analysed at a 
time and therefore has considerable advantages in terms 
of computational time. 

We inject inspiral signals from binary systems with 
chirp mass M. = 5Mq and symmetric mass ratio 77 = 0.15 
- the individual component masses are mi = 2.9Mq and 
TO2 = 12.7Mq - and fixed, but random polarisation, in- 
clination and sky location. In order to explore how the 
results depend on the signal-to- noise ratio (SNR), we re- 
peat the injections considering different distances. Note 
that if one increases the distance to the source, the gravi- 
tational waveforms is affected in two ways: the amplitude 
decreases (and so does the SNR), and the contribution 
of the massive-graviton term to the phase increases, see 



Eq. ((T3)) . Unless otherwise stated, we generated MG 
waveforms using a value of the parameter Ag = 10^^ 
m. The actual present lower bound on Ag, which is in- 
ferred from Solar System experiments, is Ag > 2.8 x 10^^ 
m [1^ . The purpose of this section is the investigation of 
our method's behaviour and its feasibility as a rigorous 
but practical way of performing tests of GR, and so we 
have ignored the existing physical bound of the graviton 
Compton wavelength here. The magnitude of the non- 
GR term in Eq. (fT^ is in fact oc A~^, thus to have a 
significant contribution from this term we need to have 
small values of A,,. 

In the analysis, the prior distributions on the param- 
eters are chosen as follows. We use uniform priors on 
the chirp mass in the range 1 < A4/Mq < 15, in ?/ over 
the range 0.1 — 0.25, in Dl over the range 1 — 1500 Mpc. 
The prior on the the polarisation, and initial phase angles 
arc flat and cover the whole range, and is proportional to 
I sin i| for the inclination angle l over the whole range too. 
The prior on the signal coalescence time is flat within a 
range ±100 msec centred on the true value. The situ- 
ation where the signal position is reconstructed with a 
instrument network, but we are performing simulations 
using a single instrument is approximated by a prior on 
the source right ascension and declination that is flat in 
the angles and has a width of 0.1 radians around the 
actual value of the injection. 

When we analyse the data considering the MG model, 
one of the parameters of the signal is of course Ag. The 
functional form that we choose for the prior on Ag is the 
scale invariant prior 

p(Ag|HMG) oc ifAr-)<Ag<Ar^'^\ ^^g^ 

I elsewhere. 

We choose the lower bound of the prior range as a|™'"'' = 
10^^'^ m and we analyse the data using different upper- 
limits in the range lO^^ ^^m < Ag < lO^^'^m. The 
prior (|16p is effectively non-informative on the scale 
of Ag between its bounds [g^I, and also implies that 
p(log AglHtvic) = const. In the numerical integration, we 
actually consider log Ag as massive-graviton parameter as 
it is more easily sampled over this large a range. 

The analysis of the simulated data sets is carried out 
using the nested sampling algorithm that we have devel- 
oped for ground-based observations of coalescing binary 
systems, with a total of 1 000 "live points" for each anal- 
ysis. Technical details about the method are available 
in Ref. [HO] and the software is released as part of the 
lalapps_inspnest program, which can be found in the 
LIGO Scientific Collaboration LALApps software distri- 
bution [U. 

The stochastic nature of the noise affects the actual 
value of the Bayes factor that is recovered by the analysis, 
which varies from noise realisation to noise realisation for 
fixed values of the signal injection parameters. In several 
instances, in the results that we provide below we average 
over the noise realization, that is we inject the same sig- 
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FIG. 1. The Bayes factor of the massive-graviton vs General Relativity model Bmcgr, see Eq. (|15[l obtained by analysing 
data sets containing a massive-graviton waveform as a function of the signal-to-noise ratios (SNR) of the injection and the 
prior range on Ag. For each value of the injection and analysis parameters, 10 data sets with independent noise realisations 
are generated and processed in order to quantify the variation of Bmg.gr produced by the stochastic nature of the noise. The 
black contours show the median value, while the white contours show the 95% confidence interval of logBMG,GR- For all the 
injections, the mass parameters are M. = 5Mq, rj = 0.15 and Ag = lO^^m. The location and orientation of the source in the sky 
were drawn randomly and kept fixed in each run, and the distance changed in order to produce different signal-to-noise ratios. 
The lower-bound of the prior massive-graviton wavelength is kept constant in all the analyses and set to logjQ(Ag™'"'/m) — 14.5; 
as upper-bound we consider in turn six values (spaced by an order of magnitude), from logj^Q(Ag™^'''/m) = 15.5 to 20.5 and are 
shown on the vertical axis. The Bayes factor decreases with the increase of the prior width because the evidence, Eq.ljSJ, is 
smaller when integrated over a larger volume in the parameter space. 



nal on many different and statistically independent noise 
realisations and we report (some measure of) the spread 
of the Bayes factors. We also note that we have chosen 
the tunable parameters of the nested sampling algorithm 
used in the analysis such that the fluctuations due to the 
noise dominate any fluctuation of the Bayes factor values 
due to the Monte Carlo nature of the technique. 



B. Selecting a theory 

We first consider the problem of discriminating be- 
tween a massive-graviton theory and General Relativity. 
In order to do so, we generate signals from binary sys- 
tems with mass parameters M = 5Mq and ij = 0.15, as 
we have described in the previous Section. We fix the 



value of the Compton wavelength to Ag = lO^^m, and 
in order to explore the dependency of the results on the 
signal-to-noise ratio (SNR) we repeat the injections us- 
ing a different value of the luminosity distances, to span 
the range 5 ^ SNR ^ 20. We also generate and analyse 
10 data sets with independent noise realisations for fixed 
injection parameters. In the analyses, the prior on Ag is 
given by Eq. (jl6p : however, for each data set the analysis 
is repeated for six different ranges of p^XglHuc) i with 
^(min) _ 2^q14.5jj^ (which is kept constant), and Ag"^'*^'' is 
changed by an order of magnitude each time to span the 



range Ag 



10 



15.5 , 



10 



20.5, 



The results of the analysis are summarised in Figure [TJ 
where we plot the Bayes factors Smg,GRj see Eq. ([3]), as 
a function of the optimal SNR of the injection and the 
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upper bound of the prior on the Compton wavelength. 
For each value of the prior range and the SNR, i?MG,GR 
shows a range of values (in fact we plot max(_BMG,GR) 
and min(i?MG,GR)j the maximum and minimum Bayes 
factor that are computed, respectively, due to the effect of 
the different realisation of the noise. The results indicate 
that, for any given prior range on Ag, the Bayes factors 
^MG,GR is, on average, a monotonic increasing function 
of the SNR. At the same time, for a given SNR, -Bmg.GR 
is a monotonic decreasing function of the prior range on 
Ag. To understand this last behaviour, consider that the 
integral in Eq. Q is performed over the whole parameter 
space. For a given likelihood, the effect of enlarging the 
prior on Ag is then to decrease the evidence. Nevertheless, 
we note that, in our simulations, log-BMG,GR, favours the 
MG model for SNR ^15 and for priors on Ag that span 
at most two orders of magnitude. 

A byproduct of our analysis are the marginalised 
PDFs, see Equation ([5]), of the parameters characteris- 
ing the relevant model. Figure [5] shows the marginalised 
PDFs for selected parameters for a specific injection (and 
noise realisation) with SNR = 14.4. For such system the 
total number of wave cycles produced by the massive 
graviton term in Eq. (|13p within the sensitivity band is 
27. The prior range on Ag is lO^^-^m < Ag < lO'^'^-^m. For 
the specific noise realisation, we obtained log -Bmg, noise = 
81.7 and the logBcR.noise = 65 giving log^MG.GR = 18. 
The signal is clearly detected using both the MG and 
the GR model, however the MG model is favored over 
GR by sa 65, 000, 000 to 1. Using the MG model, the 
parameters are correctly recovered, which is not the case 
for the GR model, where a significant bias is observed 
(see top- right panel of Figure ^ . This is an issue that 
has not been considered so far by other studies and will 
be discussed in detail in Section HV CI 

When the width of the prior on Ag is large enough to 
dominate the integral ^ and therefore the Bayes fac- 
tor does not favour the MG model, the posterior density 
function on Ag if one assumes the MG model shows a 
characteristic behavior, which is shown in Figure [3] the 
analysis excludes the region of the log]^g(Ag) prior that 
would give raise to an appreciable effect, ruling out the 
lower parts of the prior distribution. 



C. Parameter estimation and bias 

The assumption that General Relativity is the correct 
theory of gravity may affect parameter estimation (and 
therefore inference) if the actual theory is different. The 
impact is clearly not only restricted to the statistical er- 
rors with which the parameters are recovered but may 
also lead to bias. One might think that if a non-GR sig- 
nal is detected with GR templates, then the effects on 
the estimation of the parameters is negligible. However, 
this is not necessarily true, as we have already shown 
in Figure [5] (top-right panel): the in-spiral binary sig- 
nal is clearly detected (log Bgr. noise = 65), but the mass 



estimates are unambiguously biased; in fact the signal 
present in the data follows the massive graviton model. 

The particular aspect of bias is very important, as it 
introduces a conceptually different source of error - a 
systematic offset from the true value which docs not fade 
away as the signal-to-noisc ratio increases - that has com- 
pletely been ignored so far. In fact, all the work on so- 
called parameter estimation in the context of other theo- 
ries of gravity has been carried out so far using the Fisher 
information matrix formalism that assumes that there is 
no bias and deals only with statistical errors. The result 
that we have just highlighted in Figure [5] shows that this 
is a key issue. Using the MG model as an example of de- 
viation from General Relativity, we show in this section 
the possible effects. Of course every theory of gravity 
would lead to different results; in the future, it is there- 
fore important to study each one of them individually. 
For the specific case of a massive graviton theory, we 
note that the phase contribution of Ag is proportional to 
{■jTMf)~^, see Eq. it scales as the General Relativ- 

ity post^-Newtonian contribution, which is controlled by 
jy. Therefore we expect that the parameter that will be 
mostly affected by processing the data with a GR wave- 
form will be the symmetric mass ratio rj (and in turn 
the estimate of the individual mass components). The 
bottom- left panel of Figure [5] shows precisely the degen- 
eracy (or correlation) between Ag and rj (clearly in the 
case in which the data are analysed assuming the model 
MG). 

In order to explore these effects, we perform a set of 
numerical experiments in which we inject MG waveforms 
according to Eq. and at fixed SNR and mass 
parameters (M = 5Mq, rj = 0.15 and SNR = 21) and 
vary the value of the graviton Compton wavelength of the 
signal in the interval 10^'*-^m < Ag < lO^^m, using the 
values reported in Table [J For a fixed signal-to- noise ra- 
tio (or Dl), it is useful to notice that for Ag ^ 3.2 x 10^^ 
m (and for the parameters of the injections) the num- 
ber of wave cycles Mmg to which the massive graviton 
term contributes in the detection band (i.e. for frequen- 
cies above 20 Hz, in this case) drops below 1. As usual, 
for each set of injection parameters we generate 10 in- 
dependent noise realisations. In addition, the data are 
analysed three times, using three different prior ranges 
for Ag, see Eq. p6)) : the lower bound of the range is kept 

fixed to Ag™'""* — 10^'^'^m and the upper-bound is set to 
^^max) ^ iQi^-^m, lO^S-^m and lO^O-^ni, in turn. 

We explore the dependency of the Bayes factor 
log i?MG,GR on the value of Ag and we investigate whether 
in analysing the data assuming the GR model, the esti- 
mates of the unknown parameters are affected. 

Figure 3] summarises the results, as a function of 
Ag. The analyses of the data yield log Smg, noise and 
log iJGR, noise always greater than unity and in general 
^ 1 showing that the signal hypothesis is clearly favored 
over the noise hypothesis, and therefore the signal is "de- 
tected" (left panel of FigH]) . This happens regardless of 
the choice of the prior on Ag, and of the model (MG 
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FIG. 2. Marginalised posterior density functions of selected parameters obtained by analysing a data set in which a massive- 
graviton signal was injected with parameters M. = 5.OM0, = 0.15 (giving mi = 2.9M0 and 7712 = 12.7Mq) and Ag — lO^^m 
at a distance oi Dl = 60Mpc. Sky position and orientation of the orbital angular momentum are randomly chosen and yield 
an optimal SNR of 14.4. The corresponding number of wave cycles in the sensitivity band associated to the massive graviton 
term is 27. The prior on the massive graviton is flat in log^Q(Ag), and non-zero for logjQ(Ag/m) G [14.5,20.5]. The analyses 
yield log Bmg, noise = 83, log Sgr, noise = 65 and therefore log iJiviccR = 18. Clearly the signal hypothesis is preferred to the 
noise-only hypothesis using both GR and MG waveforms. However, the Bayes factor is clearly in favor of the MG model rather 
than the GR model. Top left panel: The marginalised posterior PDFs of mi and m2 obtained by analysis the data with MG 
model, p(mi, m2 |d, "Hmg). Note that the distribution is peaked at the true value of the injection parameters. Top right panel: 
The marginalised posterior PDF of mi and m2 obtained by analysis the data with GR model, p(mi, m2|ci, Wgr). The posterior 
PDF now is offset from the true value of the parameters and shows a bias in the recovery of the parameters. Bottom left panel: 
The marginalised posterior PDF of 77 and Ag obtained by analysis the data with the MG model, ^(77, Ag|d, Hmg). Bottom right 
panel: the marginalised posterior distribution of Ag using the MG model, p(Ag|d, Hmg). When the data are analysed with the 
MG model (which corresponds to the injected waveform model) the signal parameters are correctly recovered. 



or GR). Furthermore, log i?MG, noise is found in the in- 
terval 200 < log BMG.noisG < 250, whereas log BcR.noisc 
increases from « 70 for Ag = 3.16 x lO^'' m to the same 
values of log Bmg, noise for Ag ^ 10^^ m. In fact, when 
Ag exceeds « 10^^ m (for this specific choice of masses 
and signal-to-noisc ratio), logBMG.OR ~ and there is 
no conclusive evidence from the data that the MG model 
should be preferred to the GR model. This behavior is 
consistent with what one would expect: as Ag increases, 
the MG waveform looks progressively more as a GR wave- 
form, and this is reflected in the behavior of the Bayes 



factors. Eventually the Occam's Razor takes over and 
naturally the simplest theory is favored (if there are no 
strong reasons to prefer the more complex one). 

The recovered median value of rj varies with the two 
models used to analyse the data, and within the MG 
model it varies with the prior chosen on Ag. For the 
largest prior size, lO^^^-^m < Ag < lO^O-^m, MG be- 
haves similarly to GR (right panel of Fig|4]). When the 
size of the phase terms that are neglected in the model 
template is large, therefore for Ag ^ 10^^ m (c/. Ta- 
ble U for our choice of parameters), the best estimate 
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FIG. 3. The prior and marginalised posterior probability 
density function of log\Q(Ag) for the analysis of the same 
injection parameters used for Figure (2] In this analysis 
however the prior distribution is non-zero over the interval 
14.5 < logjQ(Ag/m) < 20.5. For this particular noise re- 
alisation log Bmg, noise = 82 while log Bgr, noise = 81. The 
charcoal coloured histogram is an example PDF for log\Q(Ag) 
when log -Bmg,gr does not favor the MG model because of the 
prior width. The injected waveform has M — 5AIq, = 0.15 
and Ag = lO^'^ m, with optimal SNR of 14.4. The light grey 
histogram is the prior distribution of logjQ(Ag). 
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TABLE I. Total number of wave cycles from the massive gravi- 
ton phase term as a function of Ag for the mass parameter 
values used in the injections (Ai — 5AIq and 77 = 0.15) and 
a low-frequency cut-off /low = 20 Hz, to which results in Fig- 
ure |4] refer. 



of rj is severely biased (although the signal is clearly 
detected using just GR templates). Consider the con- 
tribution to the total number of cycles in the detector 
band from the massive graviton phase term contribu- 
tion Nmg- When Nmg ^ 1, the GR waveform does 
not differ significantly from a MG waveform, so small 
or no systematic effects are introduced in the param- 
eter recovery. When Afna ^ 1, the GR template has 
no way to distinguish the phase shift due to the mas- 
sive graviton term from the post^-Newtonian term, see 
Eq. ([5]). Consequently, the only way in which the GR 
model template can match the observed MG signal is 



to lower the value of rj to account for the phase shift. 
Needless to say, even if the rj PDF is strongly peaked, 
the best estimate of 77 is actually quite different from the 
injected value (notice that logBGR,noisc ^ !)• With- 
out the knowledge of the "true" model describing the 
observed data, even in case of a successful detection in- 
ference on the parameters may be affected. For the inter- 
mediate prior, 10^'^ '^m < Ag < 10^^ '^m (empty triangles 
in Figure ^ , the MG model manages to recover the cor- 
rect value of the parameter 77. In this case, no bias is 
ever introduced. When we consider the smallest prior, 
lO^^'-^m < Ag < 10^^-^m (open diamonds in Figure H]), 77 
is correctly estimated for Ag < 2 x 10^^-^m, corresponding 
approximately to the upper boundary of the prior itself. 
When the injected value of Ag is greater than the upper 
bound of the prior probability distribution function, rj is 
instead overestimated. Our estimate is biased, but this 
time this is in the opposite direction compared to what 
we observed before. 

We further illustrate the subtleties associated to the 
bias in Figure O For this we generate a single noise re- 
alisation and inject a MG waveform with M = 5Mq, 
■q — 0.15 and Ag = lO^^m. The data set is then anal- 
ysed using three different priors over the graviton Comp- 
ton wavelength (same functional form as in Eq. (|16p 
but different intervals over which the prior is non-zero): 
10"-^ m < Ag < lO^^m, lO^^m < Ag < lO^^-^ m and 
^015.75 < A°g < lO^^-^^m (note that only the last in- 
terval contains the value of the injected signal). In Fig- 
ure [5] we show the 2-dimensional marginalised posterior 
PDFs on the mass parameters and the graviton Comp- 
ton's wavelength. The bias on the recovered values of Ai 
and 77 is a clear function of the distance between the in- 
jected value and the upper bound of the prior chosen to 
analyse the data. For ^4 the recovered value differ from 
the real one only by few percents. Ai is in fact mainly 
determined by the leading Newtonian term in Eq. (|13p . 77 
and Ag are instead anti-correlated in (jl3p . therefore when 
Ag is forced to be smaller than the actual injection value, 
rj is pushed toward the highest acceptable values set by 
its own prior. It is important to notice that in the three 
cases, log Bmg, noise IS essentially the same, therefore it 
cannot be used to prefer one choice of the prior over an- 
other. Nevertheless, it is possible to infer the inadequacy 
of a prior by checking for signs of railing against the prior 
boundary. 

The simple examples presented in this Section are 
merely indicative of a possible wider problem when grav- 
itational wave astronomy begins to take place. Severe 
bias in the parameter(s) recovery may be introduced by 
the choice of the model, cf. Figure |4l and priors, cf. 
Figure [S] 



D. Bounds 

The results of the previous section demonstrate that 
if an MG theory represents the behavior of gravity, it is 
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FIG. 4. Left panel: The Bayes factor of the signal over noise-only hypotheses assuming the MG model (black filled circles, 
open diamonds and open triangles) and GR (red open squares) as a function of the value of Ag (in meters) of the injected 
signal. In the MG case each symbol corresponds to a different prior range on Ag shown in the bottom-right corner of the plots. 
All the injections are carried out for a signal with M = 5Mq and r] = 0.15, and distance and angular parameters selected in 
such a way to produce an optimal signal-to-noise ratio of 21.6. It is clear from the fact that log -Bmg, noise and logBcR.noise are 
S> 1 that the signals are unambiguously detected. As expected the MG theory is (strongly) favoured for small values of Ag, 
but for Ag Si 10^^ m (for this specific choice of masses, distance and signal-to-noise ratio) log _Bmg.gr ~ 0, thus the data do 
not provide any conclusive evidence in favour of the MG theory. Right panel: The median of the maximum likelihood value of 
the symmetric mass ratio -q as recovered by the MG model and the GR model. The symbols are the same as the left panel. 
Each point is the result obtained by averaging over 10 independent nested sampling realisations. The error bars represent the 
combination of the 95% probability intervals from each run. The results described by the solid circles correspond to parameters 
correspond to a value of eta in the GR model that gives a number of cycles that is as close as possible to the MG (for the 
signal's injection parameters). Note that although log\BMG,GR ~ for 10^® ^ Ag/m ^ 10^®, and therefore the MG model is 
not favored over GR, the value of rj recovered by the GR model is systematically biased to compensate for the additional phase 
shift due to the mass of the graviton that the GR model can not account for properly. 



unlikely that in the near future gravitational wave ob- 
servations will be able to provide sufficiently stringent 
observational results to tip the odds in favour of sueh 
a theory. On the other hand, even if the correct the- 
ory of gravity is characterised by a massless graviton, it 
is interesting to investigate what limit on Ag one could 
place experimentally. This can be addressed in straight- 
forward way in Bayesian inference; it simply requires the 
evaluation of the marginalised posterior density function 
p(Ag|d, Hmg)! from which one can compute a lower limit 
on Ag corresponding to a given probability P: 




d\^p{x^\d,nMG) = P- (17) 



In our case we decide to set (arbitrarily) P = 0.95 and 
therefore compute the 95% lower limit on the graviton 
Compton's wavelength that we label Ag^"'^. In order to 
explore this point, we first assume that GR is the cor- 
rect theory of gravity. We then produce 100 indepen- 
dent injections for sources with the same physical pa- 
rameters (tW = 5 Mq and r] = 0.15) and luminosity dis- 
tance but different location/orientation in the sky (drawn 
uniformly on the two-sphere) so as to produce a range 
of optimal signal-to-noise ratios. The inspiral waveform 
used to generate the injection is the GR waveform de- 
scribed by Eqs. ^ and ([B]), and we analysed the data 



using the Massive Graviton model, Eqs. ((T2|) and (fT3)) 
with a prior p(Ag|-HMG) with 10"-^m < Ag < lO^^-^m. 
From the marginalised PDF of Ag we compute the 95% 
lower limit, by setting P — 0.95 in Eq. P7)) . The results 
are shown in Figure [6] Advanced LIGO could there- 
fore put a tighter constraint on the value of Ag than the 
one currently inferred from the observation of the orbit 
of Mars. Also, Bayesian model selection allows slightly 
tighter bounds on Ag compared to Fisher Information 
Matrix based studies. For values of the parameters sim- 
ilar to the ones we used, the typical lower limit from 
second generation instruments such as Advanced LIGO 
is of the order of few x lO^^ m [H, |3l|. 



V. COMBINING MULTIPLE DETECTIONS 

Beyond the present challenge of directly detecting 
gravitational waves for the first time, second and third 
generation instruments are expected to detect an increas- 
ing number of signals in the coming years [m^. There- 
fore, it is imperative that we exploit the information 
that multiple observations can bring in a statistical way. 
Bayes' theorem offers the possibility of combining the re- 
sults from each individual observation in a conceptually 
simple way. In the context of testing theories of grav- 
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FIG. 5. The effect of tlie priors on tlie marginalised posterior density functions of selected parameters. A massive-graviton 
signal was injected with M — 5Mq, 77 = 0.15 and Ag = 10^''m, producing an optimal SNR of 21.6. The same data set is 
analysed three times changing only the prior range of the graviton's Compton wavelength (everything else being the same): 



10" 



< Ag/m < 10'^ lO''^ < Ag/m < lO''^ '^ and lO'-^'^^ < Ag/m < lO' 



Left panel: marginalized posterior PDF of 77 and 



Ag. Right panel: marginalized posterior PDF of M and Ag. In both panels, the vertical dotted lines show the limits of the 
priors used and the cross marks the injected values of the parameters. The bias in parameter estimation is clearly a function 
of the distance between the "real" (injected) value and the upper bound of the prior on Ag. 
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FIG. 6. The 95% lower limit on the graviton Compton wave- 
length Ag^'^, see Eq. ((T7|, in observations with second gener- 
ation ground-based instruments of inspiral signals from bina- 
ries with chirp mass M = 5 Mq and ?7 = 0.15 modeled using 
General Relativity waveforms. The histogram shows the rel- 
ative frequency of the lower-limit from 100 injections with an 
optimal signal-to-noise ratio in the range 5 < SNR < 25. By 
comparison the Solar System bound on the graviton Compton 
wavelength is Ag > 2.8 x lO^'^ m 



ity, this may be particularly powerful if a given theory 
is characterised by some "global parameters" - e.g. the 
Compton wavelength of gravitons - that are independent 
of the actual gravitational wave signal at hand. In this 
case one can construct posterior density functions that 
take into account all the data available and therefore 
strengthen the inference process. For the specific case 



considered in this paper, we will consider how one can 
set more stringent lower limits on the graviton's Comp- 
ton wavelength using observations of a number of coa- 
lescing binaries each of which with different parameters. 
Specifically in our case we consider the example of infer- 
ring Ag from the combined probability distribution from 
multiple, independent, observations. 

Let us assume that we have a set of N independent 
observations, di, . . . ,dN, in which a gravitational wave 
signal from a coalescing binary is detected (here N should 
not be confused with number of dimensions of the signal's 
parameter vector introduced in Section |lT]) . We want 
to estimate the marginal PDF of Ag from the joint set of 
observations. From Baycs' theorem we can write: 

p{Xg\di,...,dN) cxp(Ag)p(di,...,dAr|Ag), (18) 

and from the chain rule, 

p{di, dN\Xg) = p(di|Ag, ^2, . . . , dN)p{d2, . . . , dwIAg) . 

(19) 

Since the observations are independent, Eq. (|19p simpli- 
fies to 

p(di|Ag,d2...,dAr) (xp(di|Ag), (20) 
and in general we can write 

N 

p{Xg\di,...,dN) cxp(Ag) J|p(d,;|Ag), (21) 

i=l 

where 

p(d,|Ag) = J d0pi9)p{d,\0,Xs) (22) 
is the marginalised likelihood for the ith observation. 
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A. A pedagogical example 

One can first develop some intuition about the benefits 
and power of this approach by considering a simple case, 
in which the relevant functions that enter the computa- 
tion of p(Ag|(ii, . . . , (In), Eq. ((2T|) . have simple analytical 
forms; this case is also useful to disentangle conceptual 
issues from practical ones related to the discrete nature 
of the probability density functions with which one deals 
in practice, and that we shall address in the next Section. 

The results of Section IIVBI show that in the case in 
which the correct theory of gravity is General Relativity, 
the posterior PDF p(log]^o(-^g)M) well approximated 
by a sigmoid function, sec Figure |3l Let us therefore 
consider a set oi i = 1 , . . . ,N observations each of which 
yields a posterior PDF is of the form 

1 



P(logio(Ag)M*) oc 



(23) 



1 + aie-''''°Sio(>'g) ' 

where at and bi are real numbers that differ from obser- 
vation to observation. We can simulate the outcome of 
N observations on a range of binary systems by simply 
selecting the values of a,; and bi and then combine the 
results by using Eq. ([2T|). In fact, for the specific choice 
of prior on Ag that we consider here, we have 



p(d,|Ag) cx 



p(logio(Ag)Mi) 
p(Ag) 



P(logio(Ag)|4 



(24) 



observotions 



FIG. 7. The 95% lower limit on the graviton Compton wave- 



length Ag for an example of individual and combined (in- 
dependent) observations of inspiral binaries. The plot shows 
A^^'^ for each individual observation (solid circle) and for the 



combined set of observations (solid line), following Eq. (|21|l 
as a function of the number of observation. The posterior 
PDF on Ag from which the single and combined result are 
obtained is assumed to be of the form (|23|l . with the coeffi- 
cients a and b randomly drawn from uniform distributions in 
the range a £ [2, 10] and b G [1, 10]. 

where, differently from everywhere else in the paper, 
we assumed a flat prior distribution on Ag. As an ex- 
ample, we can assume that we have, say, 50 detections of 



coalescing binaries, each of which coming from a different 
source and a different SNR, that leads to a different 95% 
lower limit on the graviton's Compton wavelength Ag^^" , 
by randomly drawing values of a and b to construct the 
PDFs (|23p . Following this procedure, we have generated 
50 distributions according to Eq. ([25]) . For each of them 
we compute the 95% lower limit on Ag, based on a single 
observation, and the same quantity using the combining 
set of observations, from Eq. ([21]). Figure [7] summarises 
the results. It shows a monotonic increase in the value of 
the 95% lower limit calculated from the combined PDF, 
compared to each single 95% lower limit. It is interesting 
to note that how much we know about Ag (here quanti- 
fied by Ag^^") never decreases as more information (data) 
are collected. If a "bad" observation is recorded (with an 
individual limit on Ag significantly worse than the rest of 
the data), this leaves the overall 95% limit constant. 



B. Combining observations in practice 

In practice, we are dealing with a finite number of sam- 
ples from the posterior distributions relevant to each de- 
tection, rather than with an analytical function. This in- 
troduces certain complications when producing the com- 
bined PDF on the Ag parameter, which we have tackled 
using the procedure that we describe below. The tech- 
nique that we propose here is applied to the specific ex- 
ample of the graviton's Compton wavelength but can be 
applied to any parameter in any theory. 

To state the problem, we wish to find an appropri- 
ate approximation to the posterior PDF for each obser- 
vation which we can multiply together in order to get 
the combined PDF, Eq. through Eq. A typ- 

ical approach is to categorise the samples into a series 
of bins, creating an histogram which approximates the 
underlying PDF. A histogram m is a set of k integers, 
m = {mi, . . . , ruk), which register the number of samples 
falling into k independent categories. As a histogram ap- 
proximates a probability distribution, which we know is 
normalised, we can approximate the probability in each 
bin by dividing each count by n = Yli=i '^j- This pro- 
cess can introduce a large amount of noise to the dis- 
tributions, as there are likely to be very few posterior 
samples falling into the bins of low probability density, 
which occurs near the region of the 95% limit (or any 
other large probability interval) in which we are inter- 
ested. A naive approach to combining these histograms 
would be to multiply the (normalised) results in each bin, 
but if one of the histograms contains zero samples in a 
bin, then all combined results produced from this distri- 
bution will also register zero in that bin no matter how 
many samples may appear there in the other histograms 
used in the combination. This problem arises because of 
the treatment of the normalised count in each bin as if it 
were the actual probability in that bin. In fact, given a 
particular histogram, we can calculate the actual proba- 
bility distribution for the probability in each bin, using 
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the Dirichlet distribution. This allows us to avoid the 
probability in any one bin going to zero, although that 
may still be the most likely value given the histograms. 

The probability of obtaining a given histogram of k 
bins from a set of n items is governed by a multinomial 
distribution: 



P(m;p) = 



mi! • • • mfc! 



n = m,; 

i=l 



(25) 
(26) 



Thus we can alternatively describe an histogram using 
the probabilities p = {pi, . . . ,pk) associated with each 
bin of the histogram itself. The probability distribution 
of the probabilities entering the definition of the multi- 
nomial distribution is described by the Dirichlet distri- 
bution. In fact the probability density for the variables 
p ~ (pi, . . . ,pk) given the histogram m = (mi, . . . , m^) 
is 



p(p; m) = 



1 



Z(m) 



(27) 



where pi,. . . ,pk > 0, J2i=iP^ = 1 ™i' ■ • ■ i"^* > 0- 
In this context, the parameters m^ can be interpreted 
as "prior observation counts" for events governed by pi. 
Furthermore, in the limit m^ = the Dirichlet distribu- 
tion is a conjugate non-informative prior for the multi- 
nomial distribution. The normalisation constant .^(m) 
is given by 



Z(m) 



nil r(m.) 



(28) 



where T is the gamma function. We use the Dirichlet 
distribution to calculate the probability pi, . . . ,pk associ- 
ated to each of the k bins given the current histogram m. 
Thanks to this procedure no bin is ever assigned a zero 
probability. Therefore, to calculate the combined PDFs, 
Eq. pTj) . we multiply "probability histograms", i.e. the 
Dirichlet distributions, describing the probabilities asso- 
ciated with each bin given each histogram, instead of 
the multinomial distributions corresponding to each his- 
togram. The combined PDF can also be thought as the 
joint probability distribution of the probability that each 
bin has to receive a future point, given all the previous 
observations. 

An alternative approach to a very similar problem is 
presented in Ref. j45l |. However, the method presented 
there may be unsuitable to estimate the probability den- 
sity functions of the parameters in regions of the parame- 
ters space where they are close to zero. Given the typical 
behavior of p(Ag|c?) {e.g. Fig. [3]), these regions are critical 
for our goal, see Eq. p7)) . We stress again that the tech- 
nique that we implement here does not rely on any form 
of approximation or smoothing of the original, discrete 
PDF. 
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FIG. 8. The 95% lower limit on logj^g(Ag) using the detection 
of 50 inspiral signals. The plot shows Ag^^ for each individual 
observation (red squares) and for the combined observations 
(black solid circles). Notice that the combined lower limit 
does not change much after 10 detections of approximately 
the same quality (i.e. same lower limit on Ag) are combined. 



We have applied the method described above to a set 
of 50 observations of gravitational waves modelled us- 
ing General Relativity and added, in the same way as 
reported in the previous Section, to Gaussian and sta- 
tionary noise following the spectral density of second 
generation instruments. The parameters of the injec- 
tions were chosen randomly to create a distribution of 
signal-to-noise ratios, and we have analysed the data us- 
ing the MG model. In the analysis the prior range for 
p(Agl'HMG), see Eq. (HH), was 14.5 < logio(Ag/m) < 20.5. 
For this specific simulation we used a higher number of 
live points in the nested sampling algorithm than be- 
fore, specifically 10^, in order to generate more accu- 
rate PDFs, in particular in regions in which the PDF 
is close to zero. Figure E] shows the result of the analy- 
sis, where both the 95% lower limit on Ag from each in- 
dividual observation and the combined result, obtained 
by applying Equation (j2ip . are reported. The results 
confirm the behavior that one expects and that we have 
shown using the proof-of-concept example described in 
Section IV Al and summarised in Figure [T] Given the dis- 
crete nature of the process, the sampling of the origi- 
nal PDFs might influence the results presented in Fig|8l 
We tested the robustness of our combination method 
against this effect by studying the scatter introduced on 
Ag^^° by randomly resampling the nested sampling out- 
put chains which yield the marginalised posterior PDFs. 
For a given single observation, we randomly re-sample 
the output chains of the nested sampling algorithm to 
produce 100 marginalised PDFs of Ag. Using the pro- 
cedure that we have described earlier in the Section, for 
each of the marginalised PDFs we evaluate logio(Ag^'^°), 
and then compute the sample mean of the single obser- 
vation logiQ(Ag^^°)'s and the spread of these values, in 
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■ combined observotions 



FIG. 9. The shaded region is the la confidence region over 
the 95% lower hmit on log\Q(Ag) obtained combining 100 re- 
ahsations of the single observation PDFs. The order of the 
observations is held fixed. The circles are the 95% lower lim- 
its on log^Q(Ag) for each single observations. The error bars 
are the la confidence interval of each point. 



particular we consider the interval over which 68% of the 
logj^Q(Ag^'^'')'s fall; virith slight abuse of terminology, we 
call this the "Icr interval". We repeat this procedure for 
each of the individual detections. For a fixed order of 
the observations, we then draw randomly a value of Ag^^° 
coming from the single observation and produce the com- 
bined 95% limit on Ag; we repeat the procedure 100 times 
to compute a sample mean and a Icr interval in the same 
way as above. The results are summarised in Figure ID 

The results of this test confirm that our procedure is 
robust. We do observe some variations in the combined 
value of log2o(Ag^^°), but this is restricted to « 0.1. More- 
over, the trend of the mean is consistent with that pre- 
sented both in Figures [7] and [8l and show a clear improve- 
ment (well beyond the uncertainty region) with respect 
to single observations. We are therefore confident in the 
histogram method of combining multiple independent ob- 
servations of a constant underlying parameter; it can be 
applied to any experiment detecting even a small number 
of gravitational wave events. 

Our study already shows a significant improvement on 
the value of Ag^*^" after the combination of only 5 PDFs 
of Ag, which turns out to be larger (by a factor of a 
few) than the value that any single experiment yields. It 
is therefore clear that the ability of collectively using the 
results from many detections shall provide a powerful tool 
to tighten the observational constraints on the relevant 
parameters and physical quantities. 



VI. CONCLUSIONS 

We have developed a statistical framework and an 
analysis approach to discriminate among theories of grav- 



ity using gravitational waves observations of binary sys- 
tems. This is a general framework and analysis pipeline 
that can be straightforwardly extended to any source. 
Our analysis strategy is based on Bayesian inference; us- 
ing a nested sampling algorithm, we can compute the 
evidence of each theory and then the Bayes factors be- 
tween them. As a by-product of the analysis, we are also 
able to compute marginalised posterior probability func- 
tions of each model parameter. This approach not only 
provides a rigorous mathematical quantification of the 
relative probability of two (or more) theories of gravity 
given prior information and observations, but also allows 
the exploration of correlations and bias on the process 
of parameter estimation in addition to purely statistical 
errors. We are able to quantify the bias introduced in 
the analysis if the assumed model does not actually cor- 
respond to the "real world" one. 

In this framework, we have also introduced the com- 
bination of the results from observations of an arbitrary 
number of sources as a natural extension of the method 
that enables more powerful inference and therefore bet- 
ter constraints. We consider the fundamental conceptual 
issues, and the practical ones that derive from dealing 
with manipulating sampling distributions with few data 
points. 

As a proof-of-principle, we have applied this analy- 
sis strategy to simulated data sets from second gener- 
ation instruments containing gravitational waves gen- 
erated during the inspiral of non-spinning compact bi- 
nary systems in General Relativity and in a "massive- 
graviton" theory of gravity. The focus of this paper has 
been the methodology, rather than the ability of future 
experiments to test massive-graviton theories, due to the 
simplifications that we have made on the gravitational 
waveform. However, we have derived results that com- 
plement past studies that are solely concerned with esti- 
mating the expected statistical errors using approxima- 
tions of the variance-covariance matrix. In fact, the cal- 
culation of the evidence suggests that second generation 
instruments in general will not have sufficient sensitivity 
to favor a massive-graviton theory over General Relativ- 
ity, given the existing constraint on the Compton wave- 
length of the graviton Ag > 2.8 x lO^^m. However, we 
have shown that they may put a limit on Ag that is a 
few times more stringent than current bounds. Further- 
more, the combination of multiple detections will lead to 
significant improvements in such a bound. 

This study should be regarded as the starting point for 
more comprehensive investigations, that extend beyond 
the simple case of non-spinnining inspiral waveforms and 
tackle in a comprehensive fashion the theories of grav- 
ity that are considered viable alternatives to Einstein's 
theory. It is essential that these aspects (including an 
end-to-end analysis pipeline to be used by the time ad- 
vanced instruments are on-line) are fully addressed and 
we plan to come back to some of them in the future. 
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